(54) Collaborator discovery method and system

(57) A collaborator discovery method and system that
tracks and correlates user activities in accessing a
variety of information resources over an electronic
network, to allow users to determine other users with
common interests, is presented. The collaborator discovery
method and system includes a monitor to track user
activities, an entry processor to update and to provide
monitored activities to a match database, and a matcher
to correlate the user activities and to diffuse particular
users' interests into information resources they have not
yet visited. The method and system tracks the long-term
and short-term interests of users utilizing a method
roughly analogous to following a user's trail of
information access through an electronic space such as the
Internet, and decays the level of interest for information
resources not recently visited. Also, the method and
system provides a means for pruning information
resources after their associated levels of interest have
become sufficiently decayed, thereby clearing the
match database of unnecessary entries. Furthermore,
the method and system provides both a means for
interaction among users in the form of a messaging
system, and a means by which users may maintain
anonymity with respect to other users.
Description

BACKGROUND OF THE INVENTION 

(1) Field of the Invention
[0001] The present invention is related to a method and an apparatus for computer networks such as the Internet,
wide area networks (WANs), metropolitan area networks (MANs), and local area networks (LANs). More specifically, it
provides a method and an apparatus that allows for correlating interests of people by tracking and correlating their
actions in a computer environment or based on their actions with respect to items cataloged in a computer environment.

(2) Background of the Invention 

The Internet connects thousands of individuals as well as many disparate networks across the world in
industries such as education, military, government, research, and others. The Internet utilizes transmission control
protocol/Internet protocol (TCP/IP) as a standard for transmitting information. An intranet is a local area network [...]
by, for example, a company or an educational institution. Through an intranet, [...] may partake in [...] source of
information. [...] becoming a major [...] channel as well as a major [...] information, and e-mail becoming a major
means of communication among the population.

Because of the computerized nature of the Internet and other networks, a rich source of tracking data is
available which may be beneficially correlated. Through the use of various networks, people are able to communicate
as well as to search for information from various sources such as web sites. The networking environment provides
[...] meaningful work-related interaction among [...]. [...] it is desirable to correlate certain user history or access
data and to make the correlation results available to other [...] a method and apparatus for users connected to a
network to [...] histories correlated, and to provide the users with data to allow them to determine
others with similar, common interests.
In the prior art, methods have been developed for identifying people with common interests for the purpose
of providing recommendations regarding various information sources. Systems of this type typically require explicit
input of current user preferences in order to predict future preferences through computer analysis of [...]
system developer to categorize items into predetermined classes. Therefore, [...] does not [...] computer analysis of
the content of information sources accessed, and which does not require that information be pre-categorized in any
way. These characteristics are particularly [...] analysis of video or audio sources can be far more difficult than it
is with text-only sources.

[...] method for information filtering is known as collaborative filtering. Instead of attempting to analyze
[...] keywords or content, collaborative filtering techniques transform each user into the role of a critic [...]
editor. Any given individual is capable of deciding what they like or dislike, and whether the information
they are looking at is relevant to their current interests or needs. The user merely has to organize and rank the
information he or she sees in terms of his or her own personal evaluation criteria. If a number of users have similar
evaluation criteria, sharing the results of their evaluations can provide each user with the benefits of exposure to a
much broader range of relevant information. In this way, each member of the group serves as a "recognition engine" to
identify and [...] appropriate to share with other users. Because this evaluation is performed by [...]

The concept of collaborative filtering has been likened to the notion of automating the "word of mouth"
process [...] friends and colleagues. Usually people know which of their friends or associates [...]. For example,
when choosing a movie, people will most often ask the opinion of others who [...] similar to their own. A
recommendation from someone who is known to have similar tastes to our own will carry far more weight than one from
another source.

[...] collaborative filtering techniques, user groupings are dynamic and may change as rapidly as users' needs
[...]. Collaborative filtering techniques take advantage of the fact that there are thousands of users, both [...]
accessed a broad range of different items, and each having opinions about the information [...] needs or interests.
In order to benefit from their collective opinions, these users need not have [...] or seen each other, and they may
even be located at opposite ends of the world. All that matters is that they have given similar ratings to many of
the same sources of information. These ratings alone can then be applied
to suggest to a user new sources of information that he or she has not yet seen.

[0008] US patent number 4,996,642, entitled "System and Method for Recommending Items", and its related patent,
US patent number 4,870,579, describe a recommendation system that uses collaborative filtering techniques. This
system relies on explicit user ratings of items in order to perform clustering of users according to their common likes
and dislikes. Furthermore, the output of this system is not intended to help people identify others like themselves, but
to provide specific recommendations about items they may wish to use, rent, or purchase.

[0009] The present invention differs from these patents and methods in that it does not rely on explicit user input 
such as ratings. Because the system is intended to primarily match users with common interests rather than to provide 
recommendations to those users, the system does not require input about user opinions regarding information 
accessed. Instead, it is able to make use of data about users' patterns of information access and their modes of use of
the information once it is accessed. 

[0010] US patent number 5,870,744, entitled "Virtual People Networking" describes a system which allows multiple 
people working for the same organization with similar interests to automatically interface with each other when any one 
of the people accesses any given one of multiple electronic sites provided through an intranet of the organization. The 
system described tracks a user's access pattern and provides the access pattern to other users upon request. The
system also allows users to explicitly rate particular sites and to provide messages regarding a particular site to
subsequent users who view their access patterns.

[0011] The present invention differs from this patent in that it does not simply provide user access patterns to other
users. Rather, it correlates user access data and implicitly determines content similarity of sites through an analysis of
access patterns. Furthermore, it provides an implicit interest rating system based on the number of times an individual
user accesses a particular site. The rating system also takes into account the passage of time through the use of a
decay factor, which degrades the determined user interest in a particular site over time.
[0012] Further references: 

Goldberg, David et al., "Using Collaborative Filtering to Weave an Information Tapestry," Communications of the
ACM, December 1992, Vol. 35, No. 12, pp. 61-70.

Maes, P. (1994) Social interface agents: acquiring competence by learning from users and other agents. In Working
Notes of the AAAI Spring Symposium on Software Agents, Stanford, CA, pp. 71-78.

Shardanand, U., and Maes, P. (1995) Social Information Filtering: Algorithms for Automating "Word of Mouth,"
appearing in CHI-95 Conference, Denver, CO, May 1995.

SUMMARY OF THE INVENTION 

[0013] In accordance with the present invention, a collaborator discovery method and system are presented. The
method provides for collaborator discovery among a plurality of users, and generally includes the steps of: (a) providing
a user history including a plurality of entries, with each entry including a user identity associated with each particular
user and a reference to a particular item accessed by that user; (b) associating particular items in the user history by
providing a measure of similarity between the particular items; (c) uniquely associating at least one scent score to each
particular item accessed by a particular user; (d) diffusing the at least one scent score associated with a particular item
accessed by a particular user to another item by generating at least one diffusion scent score from the combination of
the measure of similarity between the particular item and the other item and the at least one scent score, and
incorporating the at least one diffusion scent score into the at least one scent score of the other item; (e) repeating
step (d) for all items which have at least one scent score; and (f) determining scent match scores by correlating the
scent scores from all of the particular items to find users with common interests. The user history may be generated by
monitoring and recording the real-time accesses of the plurality of users, and steps (b) through (f) may be repeated a
plurality of times to provide a continual update of the scent scores. The measure of similarity may be generated in a
number of ways and based on a number of factors such as the temporal proximity of accesses between particular items.
The scent scores may be increased over time in proportion to the number of times a particular item is accessed in order
to provide a measure of a user's interest in the item. Particular items, such as large, general interest Internet search
engines or other items which are likely to be accessed frequently, but that are likely to yield little useful information
regarding user interests, may be filtered out of the user history. After the user scent scores have been correlated, this
information may be provided to the users in order to assist them in finding others with similar interests. To account for
the difference between short-term and long-term user interests, different scent scores may be utilized with different
rates of increase in order to help differentiate between users sharing only a passing, short-term interest and those with
similar long-term interests. The scent scores may also be decayed in order to account for changes in user interests over
time. A messaging system such as a chat facility or an e-mail system may be provided to enable users to communicate
with each other, and privacy enhancements may be added to provide for user anonymity.

EP 1 094 404 A2

[0014] The system of the present invention includes a monitor which provides a user history, with a plurality of
entries, each including a user identity associated with a particular user and a reference to a particular item accessed
by that user. The monitor may be centralized or it may be distributed among the users' systems, or it may be a hybrid 
of the two. An entry processor is connected to the monitor to receive the plurality of entries of the user history from the 
monitor, and is operative to associate pairs of particular items in the user history to provide a measure of similarity for 
each pair, and to uniquely associate at least one scent score for each particular item accessed by a particular user. A 
match database is connected to the entry processor to receive and store the measure of similarity and the scent scores. 
A matcher is connected to the match database to receive the measure of similarity and the scent scores, and to diffuse 
the scent scores to other items in the user history in proportion to the measure of similarity and to correlate the scent 
scores of all of the particular items in the user history to determine users with common interests. The user history may 
be generated by monitoring and recording the real-time accesses of the users. The system may thus provide a contin- 
ual update of the scent scores. The measure of similarity may be generated in a number of ways and based on a 
number of factors such as the temporal proximity of accesses between particular items. The scent scores may be 
increased over time in proportion to the number of times a particular item is accessed in order to provide a measure of 
a user's interest in the item. A filter may be provided to eliminate from the user history particular items, such as large, 
general interest Internet search engines or other items that are likely to be accessed frequently, but that are likely to 
yield little useful information regarding user interests. After the user scent scores have been correlated, this information 
may be provided to the users in order to assist them in finding others with similar interests. To account for the difference 
between short-term and long-term user interests, different scent scores may be utilized with different rates of increase 
in order to help differentiate between users sharing only a passing, short-term, interest and those with similar long-term 
interests. A decay engine may be provided to decrease the scent scores in order to account for changes in user inter- 
ests over time. A means for messaging such as a chat facility or an e-mail system may be provided to enable users to 
communicate with each other, and a means to provide user anonymity may be provided to allow for user privacy. 
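
As a rough illustration of the user history and filter described above, the following Python sketch models an entry as a user identity plus an item reference and drops entries for frequently accessed, low-information items. The names HistoryEntry, GENERAL_INTEREST_HOSTS, and filter_history, the timestamp field, the example hosts, and the host-based filtering rule are all assumptions made for illustration; the description does not prescribe a particular representation.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical list of general-interest hosts to exclude (an assumption).
GENERAL_INTEREST_HOSTS = {"search.example.com", "portal.example.com"}


@dataclass(frozen=True)
class HistoryEntry:
    user_id: str      # identity associated with a particular user
    item_ref: str     # reference to the accessed item, e.g. a URL
    timestamp: float  # access time in seconds (assumed field)


def filter_history(entries):
    """Return the entries whose item is not on a filtered general-interest host."""
    kept = []
    for entry in entries:
        host = urlparse(entry.item_ref).hostname
        if host not in GENERAL_INTEREST_HOSTS:
            kept.append(entry)
    return kept


history = [
    HistoryEntry("alice", "http://search.example.com/?q=orchids", 0.0),
    HistoryEntry("alice", "http://orchids.example.org/rare-species", 30.0),
    HistoryEntry("bob", "http://orchids.example.org/rare-species", 95.0),
]
filtered = filter_history(history)
```

In this sketch only the two accesses to the specialized page survive filtering, which matches the stated intent of discarding items that say little about user interests.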

BRIEF DESCRIPTION OF THE DRAWINGS 

[0015] 

FIG. 1 provides a system overview of an embodiment of the present invention demonstrating the relationship 
between the major components; 

FIG. 2 provides a system detail of an embodiment of the present invention demonstrating the components of the
entry processor, the match database, and the matcher;

FIG. 3 provides an example item index table component of the match database; 

FIG. 4 provides an example linkage table component of the match database; 

FIG. 5 provides an example hit table component of the match database; 

FIG. 6 provides an example correlations table component of the match database; 

FIG. 7 is a system overview illustrating forward and backward scanning, privacy enhancements to enable users to 
maintain anonymity, and the system's relationship with external resources; and 
FIG. 8 is a flow chart generally representing the steps of the present invention. 

DETAILED DESCRIPTION

[0016] The present invention is useful for determining potential collaborators by monitoring their information gath- 
ering and organizing activities, and may be tailored to a variety of applications. The following description is presented 
to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular 
applications. Various modifications, as well as a variety of uses in different applications will be readily apparent to those 
skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the 
present invention is not intended to be limited to the embodiments presented, but is to be accorded the widest scope 
consistent with the principles and novel features disclosed herein. 

[0017] An object of the present invention is to help people find others who might be well suited as collaborators 
because of an apparent commonality of interests. It is a further object of the present invention to identify potential col- 
laborators on the basis of passively acquired data about individuals' habits with regard to what information they access 
and how they organize such information for their own use. In this description, as well as throughout the remainder of 
this patent, the phrase "passively acquired" is used to indicate that the data about people's habits is not obtained 
through any form of explicit questioning of the individuals involved. Instead, all data is to be acquired as a byproduct of 
people's ordinary information gathering and organizing activities so as to minimize the impact the system has on peo- 
ple's time and attention. It is an additional object of the present invention to analyze patterns of information access with- 
out regard to the specific nature or content of the information being accessed. All that is needed is a unique information 
identifier to distinguish each item, such as a uniform resource locator (URL) in the case of the World Wide Web. 
[0018] Human patterns of information access can reveal a great deal about user interests. As a consequence, a
commonality of access patterns between two or more individuals can reveal a commonality of interests between the
individuals. For example, if several people read the same articles in a magazine or view the same pages on the World
Wide Web, or call the same phone numbers, there exists a possibility that these people have some interests in common.
Furthermore, the possibility that these people have interests in common increases as the number of items they have
accessed in common increases. The likelihood that two people who have accessed items in common also have particular
interests in common further increases the more the items accessed tend to be highly specialized and rarely
accessed by others. The fact that two people have both read the same lead article in a major newspaper or have both
accessed the home page of a major Internet search engine says relatively little about any common interests they may
have. On the other hand, should these same two people both have visited an obscure web page on a rare topic, then
they are more likely to have some common interests of a very specialized nature.

[0019] The fact that an item accessed may actually fail to satisfy a person's intended purpose or need for accessing
that item does not detract from the usefulness of that access event as an indicator of the person's interests. If the person
felt there was something worthwhile in that item either because of how it was referenced, because of a recommendation,
or because of its title, the fact that the person thought the item might conform to their interests means that others
with the same interests might do the same.

[0020] Often, even people with a significant degree of interests in common may not access exactly the same items. 
Because of this fact, another object of the present invention is to provide a means to infer similarity between items 
accessed by observing access patterns over time. While other approaches have concentrated on content analysis as
a means for determining similarity, the present invention exploits the fact that human information access tends to follow
a continuity of interests, rather than jump between discrete pockets of diverse interests. Therefore, items accessed in 
succession by a user can be grouped as having some minor degree of similarity. This measure of similarity between 
items can be enhanced with repeated association of items either by the same user or different users. 
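
The idea of inferring similarity from items accessed in succession can be sketched as follows. This is a minimal illustration, not the described implementation: the time window WINDOW_SECONDS, the increment LINK_STEP, the saturating update that keeps linkage below 1, and the function name build_linkage are all assumptions.

```python
from collections import defaultdict

WINDOW_SECONDS = 300.0  # assumed time threshold between successive accesses
LINK_STEP = 0.1         # assumed strength added per observed association


def build_linkage(accesses):
    """Build symmetric item-to-item linkage from (user, item, time) tuples.

    Items accessed in succession by the same user within WINDOW_SECONDS are
    associated; repeated associations strengthen the link, capped below 1.
    """
    by_user = defaultdict(list)
    for user, item, when in accesses:
        by_user[user].append((when, item))

    linkage = defaultdict(float)
    for events in by_user.values():
        events.sort()  # order each user's accesses by time
        for (t1, a), (t2, b) in zip(events, events[1:]):
            if a != b and t2 - t1 <= WINDOW_SECONDS:
                key = tuple(sorted((a, b)))
                linkage[key] += (1.0 - linkage[key]) * LINK_STEP
    return dict(linkage)


accesses = [
    ("alice", "page_a", 0.0),
    ("alice", "page_b", 60.0),
    ("alice", "page_c", 4000.0),  # too late to be linked to page_b
    ("bob", "page_a", 10.0),
    ("bob", "page_b", 20.0),
]
links = build_linkage(accesses)
```

Here the link between page_a and page_b is reinforced by two different users, while page_c remains unlinked because it was accessed long after the preceding item.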
[0021] The present invention generally involves several steps. First, an activity history profile is developed for each
user by classifying and recording their activities of accessing information sources by associating a unique "scent score"
scalar value for each information item for each accessing user (scent scores will be discussed in more detail further
below). The scent score scalar value may be increased by various activities taken by the user. For example, an
accessed object's scent score associated with a particular individual may be incremented each time the user accesses
the object. Further, other activities such as a user storing the object's location by means of a bookmark file or other
similar utility may be used to further enhance its scent score for that user. The scent scores for all objects may be
decayed as a function of elapsed time and their current values. This decay may follow any desired function, and may take
the form of a linear degradation, half-life type degradation, or any other suitable form of degradation. Second, each
accessed information item is associated with a second scent score scalar value for each accessing person. The same
increasing and decaying operations are applied as for the first scent score, except that the increasing and decaying
are performed in smaller amounts. For purposes of this description, the first scent score may be thought of as a
short-term scent score because it is subject to greater fluctuation from recent activities than the second scent score,
which may be thought of as a long-term scent score. Although two scent scores are utilized for this description, the
number and type of scent scores generated for a particular embodiment may vary depending on the specific application.
Third, a linkage value is assigned for various pairs of items accessed. This value is determined based on factors
such as sequential access patterns of individual users, user-determined groupings of accessed items such as placement
of items into a collection such as a folder, one item including a reference to the other, associated item, and
both items being referenced by a third item. Fourth, the first and second scent scores from items accessed by a user
are propagated to related items according to the linkage values which act as weights, linking various items by degree
of similarity. Fifth, match scores for pairs of individuals are obtained using the correspondence between their scent
score scalar values. Sixth, the scent scores for each item may be decayed and removed from long-term storage when
the scent scores have become sufficiently small.
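
The fourth and fifth steps above (propagating scent scores along linkage values and matching users) can be sketched as follows. This is only one possible reading: the rule that a diffused score is the product of scent and linkage, the saturating way it is folded into the neighbouring item's score, and the use of cosine similarity as the match score are all assumptions, as are the function names diffuse and match_score. A single scent score per item is used for brevity instead of separate short-term and long-term scores.

```python
import math


def diffuse(scents, linkage):
    """One diffusion pass over a user's {item: scent} map.

    `linkage` maps (item, other_item) pairs to a similarity in [0, 1].
    Assumed rule: diffused = scent * linkage, folded into the other item's
    score with a saturating update so scores stay within [0, 1].
    """
    result = dict(scents)
    for (a, b), weight in linkage.items():
        for src, dst in ((a, b), (b, a)):  # treat linkage as symmetric
            diffused = scents.get(src, 0.0) * weight
            if diffused > 0.0:
                current = result.get(dst, 0.0)
                result[dst] = current + (1.0 - current) * diffused
    return result


def match_score(scents_u, scents_v):
    """Cosine similarity between two users' scent maps (assumed measure)."""
    shared = set(scents_u) & set(scents_v)
    dot = sum(scents_u[i] * scents_v[i] for i in shared)
    norm_u = math.sqrt(sum(s * s for s in scents_u.values()))
    norm_v = math.sqrt(sum(s * s for s in scents_v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


linkage = {("page_a", "page_b"): 0.5}
alice = {"page_a": 0.8}
bob = {"page_b": 0.6}

before = match_score(alice, bob)
after = match_score(diffuse(alice, linkage), diffuse(bob, linkage))
```

The two users in this example never accessed the same item, so their match score is zero before diffusion; once each user's scent spreads to the linked item, the score becomes positive. This illustrates why propagation lets the system relate users whose histories do not literally overlap.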

[0022] The present invention preferably operates on the Internet, where users access information items such as 
World Wide Web pages available from a virtually unlimited number of sources. However, it may also reside on smaller 
networks such as corporate intranets, or within a particular website, or it may be used in conjunction with an item
checkout system such as those commonly used in libraries.

[0023] A general overview of the major components of the present invention is shown in FIG. 1. The system
includes a monitor 100, an entry processor 102, a match database 104, and a matcher 106. A match server 108
provides a system through which a variety of users 110 may interface with the match database 104 in order to determine
those with interests similar to theirs. The monitor 100 is used to track the activities of the plurality of users 110 as they
access various information resources available through the system. Its primary function is to collect information about
user activities, for example their web-browsing sessions and their organization of particular items on their computer
desktop, in bookmark files, and in folders. Depending on the needs of a particular system, the monitor 100 may be
centralized so that all access requests must pass through it, or it may be distributed so that each user's system tracks the
user's activity locally, or the monitor 100 may be a hybrid mixture. The monitor 100 provides user activity information to
the entry processor 102, which receives the information and selects portions that are relevant for matching user
interests. The entry processor 102 then inserts the relevant items into the match database 104 and maintains consistency
between the new, incoming information, and the older, previously stored information. The entry processor 102 creates
scent scores in the match database 104 corresponding to items that have been accessed and also provides linkage
information between items that have been accessed. This information is inferred from user activities and includes such
factors as time delays between the user's access of each information item in a series of information items. The matcher
106 interacts with the match database 104 and its activities may be summarized as follows: (1) it receives a measure
of similarity and scent scores and diffuses the scent scores to other items in the user history in proportion to the
measure of similarity, and (2) it correlates the scent scores of all of the particular items in the user history to determine
users with common interests (the scent score, decay, linkage, scent score diffusion, and scent match score generation
will be discussed in detail further below). As previously stated, the match server 108 provides a means of interface for
the plurality of users 110 that enables them to access information about their similarity to other users. The exact user
interface may vary from application to application and may take forms such as lists of users with similar interests or a
graphical interface with spatial relationships indicating degrees of user similarity. Additionally, the user interface may
allow a particular user to determine the similarity of any user to any other user. Further, FIG. 1 demonstrates a message
server 112, which may be provided to allow interaction between users. In general, users 110 utilize the system to identify
others having common interests and information resources that may be of interest. With the message server 112, users
may also contact each other to discuss items of interest or for other purposes.

[0024] The residence of the various components of the present invention may be chosen as necessary for a partic- 
ular application. For example, the monitor 100 and the entry processor 102 may be designed to reside on the client 
computer of a user 110 in such forms as an independent software application or an Internet browser plug-in or may
alternately reside on a proxy machine along with other components of the system. Thus, each user may have a monitor
and an entry processor resident on their system. Alternately, other hybrid configurations may be developed. The exact
configuration of the system components may be selected to meet the needs of a specific use, and is not intended to be 
limited to the specific embodiments described herein. 

[0025] It is important to note that user activities, for purposes of the present invention, include any information
gathering or organizing activity undertaken by a user and accessible by the monitor 100. Some of the activities, as
mentioned before, include a user's web-browsing activity and their organization of items on their computer desktop or into
files. The list of possible activities that may be beneficially monitored is expansive, and is certain to develop as different 
methods of information organization arise. Thus, the specific method of organization is not critical for the present inven- 
tion. 
[0026] Next, a more detailed discussion of scent score generation and decay, linkage generation, scent score
diffusion, and scent match score generation is presented.

(1) Scent Score Generation and Decay

[0027] A simple model of the relevance of each object accessed to an individual user's interests is established by 
associating two unique scalar values to each item accessed by each user. These scalar values are referred to as a 
user's "scent score" for a particular item because they are intended to emulate trails left behind as a user travels through 
an information space. When a given entry is processed, a database entry is made which associates the item and user 
with two scalar values, a first scent score, termed a long-term scent score (SL) and a second scent score, termed a 
short-term scent score (SS). 

[0028] If an entry already exists for the given item and user pair, then the two scent scores are updated as follows:

SL = SL + (1-SL)*KL 
SS = SS + (1-SS)*KS 

[0029] Here, KS and KL may be constants or equations, chosen such that KS > KL. This causes the
value of SS to rise faster than the value of SL. Other update schemes are also possible so long as the scent score
scalars for a user at a given item increase to some degree each time the user visits the item and are subject to a
certain limit to the total amount of the increase over time.
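
The update rule above can be exercised with a few lines of Python. The numeric values chosen for KL and KS below are assumptions picked only to satisfy KS > KL, and the helper name bump is hypothetical.

```python
K_LONG = 0.05   # assumed value of KL
K_SHORT = 0.30  # assumed value of KS; must exceed KL


def bump(score, k):
    """Apply S = S + (1 - S) * K, which moves S toward 1 without exceeding it."""
    return score + (1.0 - score) * k


sl = ss = 0.0
for visit in range(5):
    sl = bump(sl, K_LONG)   # long-term score rises slowly
    ss = bump(ss, K_SHORT)  # short-term score rises quickly
```

Because each visit closes a fixed fraction K of the remaining gap to 1, a score that starts at zero equals 1 - (1 - K)^n after n visits. The score therefore increases with every visit yet stays bounded, which is the limit on total increase that the text requires.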

[0030] If an entry does not already exist for the given item and user pair, then a new entry is created, and initial val- 
ues of SL and SS are established as follows, with CL and CS representing constant initial values for SL and SS: 

SL = CL 
SS = CS

[0031] A similar procedure may be performed for each entry in a bookmark file, with the only difference being that 
a larger value may be used for the constants KL and KS in order to signify a greater level of significance to items that 
have been saved as bookmarks as opposed to merely having been visited. Similarly, different values may be assigned
to other information organizing activities such as the arrangement of items on a computer desktop or the downloading 
of files from a site. Further, although two types of scent scores have been discussed herein, the number of types of 
scent scores utilized may be selected for optimal performance for a particular embodiment. 
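
Entry creation, ordinary updates, and the stronger update for bookmarked items can be combined in one routine. The following is a minimal sketch under stated assumptions: the table layout (a dictionary keyed by user and item), the function name record_activity, and every numeric constant are illustrative choices, not values taken from the description.

```python
# Assumed constants; the description only requires KS > KL and larger
# gains for bookmarking than for a plain visit.
INITIAL = {"long": 0.02, "short": 0.10}         # CL, CS
GAINS = {
    "visit": {"long": 0.05, "short": 0.30},     # KL, KS for a visit
    "bookmark": {"long": 0.15, "short": 0.60},  # larger KL, KS
}


def record_activity(table, user, item, activity="visit"):
    """Create or update the scent scores for a (user, item) pair."""
    key = (user, item)
    if key not in table:
        table[key] = dict(INITIAL)  # new entry: SL = CL, SS = CS
        return table[key]
    scores = table[key]
    for kind, k in GAINS[activity].items():
        scores[kind] += (1.0 - scores[kind]) * k  # S = S + (1 - S) * K
    return scores


table = {}
record_activity(table, "alice", "page_a")              # creates the entry
record_activity(table, "alice", "page_a")              # ordinary revisit
record_activity(table, "alice", "page_a", "bookmark")  # stronger signal
```

Keeping the per-activity constants in a lookup table makes it straightforward to add further organizing activities, such as desktop arrangement or file downloads, each with its own weights.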

[0032] While the scent score associated with a user at a web page increases with each visit, it also decreases over 
time. This decay prevents all items from ultimately moving to the maximum scent score intensity level. It also allows the
scent score information to better reflect recent user interests. Just as the long-term scent score increases more slowly 
than the short-term scent score, long-term scent score also decays more slowly than short-term scent score. The peri- 
odic update is established as follows: 

SL = SL*DL
SS = SS*DS

[0033] Here, DS and DL may be constants or equations, chosen such that DS < DL. This causes the
SL values to decay more slowly than the SS values. In general, logs are acquired over time, with time-stamped entries.
Therefore, the decay function can be performed at regular intervals in accord with times of log entries. However, the 
decay function is optimally performed after one or two scent score propagation steps have been performed, as will be 
further described. It is important to note that various decay schemes may be used depending on the requirements of a 
specific application. 
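
The periodic decay step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the function name `decay` are hypothetical, and the factors 0.5 and 0.8 are example values that satisfy DS < DL.

```python
# Example decay factors (assumed values); the text only requires DS < DL.
DS = 0.5  # short-term decay factor
DL = 0.8  # long-term decay factor


def decay(scores, ds=DS, dl=DL):
    """Return a new {item: (SS, SL)} map after one periodic decay step."""
    if not ds < dl:
        raise ValueError("DS must be smaller than DL")
    # SS = SS * DS and SL = SL * DL, applied to every item.
    return {item: (ss * ds, sl * dl) for item, (ss, sl) in scores.items()}


# Hypothetical starting scores for one user at one page.
scores = {"page_a": (0.8, 0.4)}
for _ in range(3):
    scores = decay(scores)
```

After three decay steps the short-term score has fallen to one eighth of its starting value, while the long-term score retains roughly half of its starting value, which is the intended difference in persistence.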


(2) Linkage Generation 

[0034] The linkage is a measure of similarity between different web pages. This measure is generated to capture
the notion that a user's interest in one item should be reflected in related items. One means by which this may be
accomplished is to consider the sequence of items visited by a user as an indicator of similarity. Thus, if a user accesses
one item and then another item within a short period of time, a linkage association may be established between the two
items. This method is driven by the idea that people tend to follow a line of thought and that their interest in a particular
topic will be present over a period of time during a given information gathering session. The degree of linkage
established by this means may be either a constant within a fixed time threshold, or it may be made a function of the time
between access events. Other means, for example using groupings established from user bookmarks, may be used. In
this case, if several item references have been placed within a common bookmark folder, then these items may all be
associated with one another. Alternately, a single new reference may be created to represent the folder itself, and all
items within the folder may be linked to the folder reference. Another means for identifying item similarity is by reviewing
the links to other items contained in an item. In essence, any item can be said to have some degree of similarity to any
item that it references. Conversely, any item that is referenced by other items can be said to have some degree of
similarity to the items that reference it. By using search engines, indexes, or other information sources that may be found
on a network or on the Internet, it is possible to obtain a list of items that reference a given item. This method is used
to find items that reference an item that has been accessed by a particular user. There are many other methods by which
similarity may be determined. For example, in Internet search engines, similarity is determined by factors such as
common occurrences of various keywords within text documents, by the titles of links within a page, and by the filenames
of graphics and other files associated with a page. Any such method may be applied in order to determine and update
the similarity measure between items.

[0035] In the case where sequential access to items is used in the generation of a linkage measure, the measure
is determined using an associative reinforcement algorithm. Each time two items, A and B, are accessed in proximity
to one another, the linkage measure L_AB is updated, where L'_AB is the updated linkage measure, as follows:

$$L'_{AB} = L_{AB} + (1 - L_{AB}) \cdot k(t)$$

where k(t) < 1.

[0036] The value of k(t) is the incremental update factor for associating item A to item B where t represents the time 
that has elapsed between a user accessing item A and then item B. In general, the value of k(t) decreases as the value 
of t increases from zero. Also, for each forward association created from item A to item B, a reverse association from 




item B to A may be created as follows, where L'_BA is the updated association value:

$$L'_{BA} = L_{BA} + (1 - L_{BA}) \cdot a \cdot k(t)$$

where k(t) < 1
and a < 1

[0037] In general, this reverse association will be made weaker than the forward association by use of a value of a 
that is less than one. A result is that the similarity measure between any two items will not necessarily be symmetric. 

[0038] When other methods for determining similarity between items are used, they are combined with the
similarity measure obtained from sequential access. In this case, a similar form of reinforcement update is used, except that
the update factor k(t) is replaced with a value β * S, where S is the similarity measure calculated by whatever means is
chosen, and β is a constant used to indicate the significance of the source of the measure. For example, β will be larger
for similarities obtained from user bookmark folder groupings than for similarities obtained from references contained in
documents.
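
The reinforcement updates described for linkage generation can be illustrated with a short Python sketch. The exponential form of `k`, its parameters `k0` and `tau`, the default reverse weight `alpha`, and the dictionary used to hold linkages are all assumptions made for the example; the text only requires that k(t) be less than 1 and decrease as t grows, and that the reverse weight be less than 1.

```python
import math


def k(t, k0=0.5, tau=60.0):
    """Hypothetical update factor: below 1 at t = 0 and shrinking as t (seconds) grows."""
    return k0 * math.exp(-t / tau)


def reinforce(link, factor):
    """L' = L + (1 - L) * factor; the result stays below 1 when factor < 1."""
    return link + (1.0 - link) * factor


def record_access(links, a, b, t, alpha=0.5):
    """Update the forward (a -> b) and the weaker reverse (b -> a) linkage."""
    links[(a, b)] = reinforce(links.get((a, b), 0.0), k(t))
    links[(b, a)] = reinforce(links.get((b, a), 0.0), alpha * k(t))


def record_similarity(links, a, b, s, beta):
    """Fold in another similarity source, using beta * S as the update factor."""
    links[(a, b)] = reinforce(links.get((a, b), 0.0), beta * s)


links = {}
record_access(links, "A", "B", t=0.0)   # first A-then-B access
record_access(links, "A", "B", t=0.0)   # a repeat access reinforces the linkage
```

Because each update moves the linkage a fraction of the remaining distance toward 1, repeated accesses strengthen the association with diminishing returns, and the forward linkage ends up stronger than the reverse one.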



(3) Scent Score Diffusion 



[0039] Scent scores are dispersed from items a user has visited to other similar items through diffusion and decay 
processes. The diffusion process uses the web page similarity measures as a means to determine which pages are 
adjacent. Given a user's scent score with intensities SS_A and SL_A at item A, and intensities SS_B and SL_B at item B,
the proximity from item A to item B, given by the linkage measure L_AB, is used to update the user's scent score at
item B as follows, where the prime symbol indicates the updated value:

$$SS'_B = SS_B + (SS_A - SS_B) \cdot L_{AB} \cdot r \quad \text{if } SS_A > SS_B$$

$$SL'_B = SL_B + (SL_A - SL_B) \cdot L_{AB} \cdot r \quad \text{if } SL_A > SL_B$$



[0040] Here the term r determines the general rate of diffusion. In some cases it may be desirable to
make the value of r different for short-term and long-term scent score intensity values. For example, making the value
of r larger for short-term scent scores than for long-term scent scores would allow the short-term scent score values to
propagate faster than long-term scent score values. In all cases, r must be less than or equal to 1.

[0041] Before any scent score values are propagated from item A to item B, a condition must be satisfied that
depends on the number of items that have been identified as similar to item A and the number of unique user scent scores that
already exist at item A. If the product of these two quantities is greater than a chosen threshold value, then no scent
score will be propagated from item A. This is done to create a model wherein some items act as a sink for scent scores.
Scent score sinks are generally information sources which are very generic or which serve as gateway/portal sites, such
as major search engines, corporate home pages and the like, which many users have visited and from which little useful
interest-related information may be derived.
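
A minimal Python sketch of one diffusion pass, including the sink test, is given here for illustration. The data layout, the function name `diffuse`, the single shared rate `r`, and the default threshold of 40 are assumptions for the example rather than a definitive implementation.

```python
def diffuse(scores, links, scent_counts, user, r=0.5, sink_threshold=40):
    """One diffusion pass for one user.

    scores: {(user, item): (SS, SL)}
    links: {(source, destination): linkage in [0, 1]}
    scent_counts: {item: number of distinct user scent scores at that item}
    """
    # Number of items identified as similar to each source item.
    out_degree = {}
    for (src, _dst) in links:
        out_degree[src] = out_degree.get(src, 0) + 1

    updated = dict(scores)
    for (src, dst), linkage in links.items():
        if (user, src) not in scores:
            continue
        # Sink test: no propagation when the product exceeds the threshold.
        if out_degree[src] * scent_counts.get(src, 0) > sink_threshold:
            continue
        ss_a, sl_a = scores[(user, src)]                      # source values before this pass
        ss_b, sl_b = updated.get((user, dst), (0.0, 0.0))     # destination may not exist yet
        if ss_a > ss_b:
            ss_b += (ss_a - ss_b) * linkage * r
        if sl_a > sl_b:
            sl_b += (sl_a - sl_b) * linkage * r
        updated[(user, dst)] = (ss_b, sl_b)
    return updated


start = {("u", "A"): (0.8, 0.4)}
spread = diffuse(start, {("A", "B"): 0.5}, {"A": 1}, "u")
blocked = diffuse(start, {("A", "B"): 0.5}, {"A": 100}, "u")
```

In the first call item B receives a fraction of the scent at A. In the second call item A carries so many user scent scores that it is treated as a sink and nothing is propagated.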

(4) Scent Match Score Generation 

[0042] With each user having both long-term and short-term scent scores associated with various items, the next 
step is to compute scent match scores for each pair of users. Scent match scores can be obtained by comparing the 
short-term scent scores of two users, the long-term scent scores of two users, or the short-term scent scores of one 
user against the long-term scent scores of another. The scent match scores are obtained through the equations below: 



$$SS\_Match_{ab} = \frac{\sum_p SS_{ap}\, SS_{bp} / Stot_p}{\sqrt{\sum_p SS_{ap}^2}\; \sqrt{\sum_p SS_{bp}^2}}$$

$$SL\_Match_{ab} = \frac{\sum_p SS_{ap}\, SL_{bp} / Stot_p}{\sqrt{\sum_p SS_{ap}^2}\; \sqrt{\sum_p SL_{bp}^2}}$$

$$LL\_Match_{ab} = \frac{\sum_p SL_{ap}\, SL_{bp} / Stot_p}{\sqrt{\sum_p SL_{ap}^2}\; \sqrt{\sum_p SL_{bp}^2}}$$

[0043] Where:

SS_Match_ab is the match between short-term scent scores of users a and b;
SL_Match_ab is the match between the short-term scent score of user a and the long-term scent score of user b;
LL_Match_ab is the match between the long-term scent scores of users a and b;
Stot_p is the total number of distinct user scent scores that can be found at item p;
SS_ap is the short-term scent score scalar assigned to user a at item p; and
SL_ap is the long-term scent score scalar assigned to user a at item p.

[0044] The above calculations are comparable to treating each user's scent score pattern as a very
high-dimensional vector, and finding the cosine of the angle between each vector pair. The one distinction, however, is that the
division by Stot_p in the numerator sum provides a discount factor for scent scores that occur at items that are accessed by
many users. This discounting prevents items that are relatively unrelated to any specific user interests from being
counted in the match score.
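
The discounted cosine comparison described above can be sketched as follows. This is an illustration only: it assumes that the denominator is the product of the Euclidean norms of the two users' score vectors, and the function name `match` and the dictionary layout are hypothetical.

```python
import math


def match(vec_a, vec_b, stot):
    """Discounted cosine match between two {item: scent score} maps.

    stot maps each item to the number of distinct user scent scores found there.
    """
    shared = set(vec_a) & set(vec_b)
    # Each product is discounted by the number of user scent scores at the item.
    numerator = sum(vec_a[p] * vec_b[p] / stot[p] for p in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return numerator / (norm_a * norm_b)


user_a = {"p": 0.6, "q": 0.8}
user_b = {"p": 0.6, "q": 0.8}
niche = match(user_a, user_b, {"p": 2, "q": 2})      # items shared by few users
popular = match(user_a, user_b, {"p": 20, "q": 20})  # items shared by many users
```

The same function covers the short-term, long-term, and mixed comparisons, depending on which score vectors are passed in. Identical interest patterns at heavily visited items yield a much lower match than the same patterns at items visited by only a few users.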

[0045] Although this method of correlation has been found useful in the context of the present invention, other
correlation schemes may be used depending on the needs of the particular application and the preferences of the
particular designer.

[0046] Once user matches are computed and stored in the match database 104, users may access these results
to locate potential collaborators. In the preferred implementation, a match server 108 is used to provide multiple users
access to the match database 104. The match server 108 uses a user's login name or an Internet protocol (IP) address
of the requesting user's machine in order to identify the user within the match database 104. From the scent match
scores computed for the requesting user, those with the highest values are used to select a set of potential
collaborators. In order to understand why a certain individual has been identified as a potential collaborator, the user may
examine any given candidate that is presented to find information on (1) the items that both users have visited; (2) the items
that the other user has visited that the requesting user has not, but which are close to the interests of the requesting
user; (3) the items that the requesting user has visited but that the other has not, but which are close to the other user's
apparent interests; and (4) items which neither user has visited, but which are close to the apparent interests of both
users.

[0047] Finally, a pruning operation may be performed in order to keep the match database 104 from growing to an
unmanageable size. In this operation, entries that have little value for matching are eliminated by pruning all entries
where user scent scores fall below a certain threshold value due to decay.

[0048] More detail of the entry processor 102, the match database 104, and the matcher 106 is given in FIG. 2.
The entry processor 102 includes an information item type filter 200, an index engine 202, an item association engine
204, and a scent update engine 206. The match database 104 includes an item index 208, a linkage table 210, a hit
table 212, and a correlations table 214. The matcher 106 includes a decay engine 216, a link and hit counter 218, a
diffusion engine 220, and a correlations engine 222. The information item type filter 200 receives incoming information
about user activities from the monitor 100, including user identification information, an item identifier, and a time code, and
may receive additional information useful for determining user interests. It then examines the item identifier to determine
the type of item or the source of the item. It filters the items based on particular criteria chosen to filter out unwanted
information items. For example, it may filter by eliminating files of a particular type from a particular host or by allowing
only certain file types to pass. The information item type filter 200 serves as a means for including only information
sources of a desired type and acts by either accepting or rejecting particular items.

[0049] When the information item type filter 200 accepts an information item, it passes the information gathered to
the index engine 202, which interacts with the item index 208 of the match database 104 to create a new index entry
for the item if it was not previously indexed or updates the index entry if the item was previously indexed. The index
engine 202 assigns each new information item a unique identifier. The item association engine 204 receives information
about user activities from the index engine 202 and examines the sequence of a user's access to information items. It
generates linkages between pairs of items that are accessed during a relatively short period of time. Preferably, the item
association engine 204 maintains a relatively short-term memory and stores the last two to three items a user has
accessed. It examines the time between the current and the last few information item accesses. If the time has been
sufficiently short, the item association engine 204 will create an association, in the form of a source item identifier and
a destination identifier, in the linkage table 210 of the match database 104. Each of the items in order within the
sequence is assigned a certain linkage strength. Item pairs out of order within the sequence are assigned a weaker
strength. For example, if three information sources A, B, and C have been accessed in that order, linkages are created
for B to C and A to B with a certain linkage strength. Linkages may also be created for A to C, but with a weaker linkage
strength. Although a two to three item storage is preferable, any number of items may be stored and linkages
determined for them by this method. In addition to the forward linkages just described, reverse linkages may also be made, such
as C to B, B to A, and C to A, and assigned strengths as desired.
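
The behaviour of the item association engine can be sketched in Python as follows. The class name `ItemAssociator`, the time threshold of 300 seconds, and the two strength values are hypothetical choices made for the example; only the forward linkages are shown.

```python
from collections import deque


class ItemAssociator:
    """Hypothetical short-term memory that emits linkages for recent accesses."""

    def __init__(self, memory=3, max_gap=300.0, adjacent=0.5, skipped=0.25):
        self.recent = {}          # user -> deque of (item, timestamp)
        self.memory = memory      # items remembered, counting the current one
        self.max_gap = max_gap    # seconds; assumed time threshold
        self.adjacent = adjacent  # strength for consecutive items (assumed)
        self.skipped = skipped    # weaker strength for non-adjacent items (assumed)

    def access(self, user, item, ts):
        """Record an access and return [(source, destination, strength), ...]."""
        history = self.recent.setdefault(user, deque(maxlen=self.memory - 1))
        links = []
        # Walk back from the most recent earlier access to older ones.
        for distance, (prev, prev_ts) in enumerate(reversed(history), start=1):
            if prev != item and ts - prev_ts <= self.max_gap:
                strength = self.adjacent if distance == 1 else self.skipped
                links.append((prev, item, strength))
        history.append((item, ts))
        return links


engine = ItemAssociator()
first = engine.access("u", "A", 0.0)
second = engine.access("u", "B", 10.0)
third = engine.access("u", "C", 20.0)
```

With the accesses A, B, C in that order, the sketch produces A to B and B to C at the full strength and A to C at the weaker strength, mirroring the example in the text.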

[0050] The scent update engine 206 receives information from the item association engine 204 and updates the hit
table 212 of the match database 104, assigning particular scent scores to a particular information source for a particular
user 110. If the particular user 110 has not accessed the particular information source before, the scent update engine
206 creates new scent score entries for that user 110 for the particular information source. Typically, the scent score
entries include scent scores, user identification, a time stamp of the last hit on the item by the particular user 110, and
an information item identifier, among other pieces of information. If the particular user 110 has previously visited the
information source, the time stamp and scent scores are updated. As discussed previously, the long-term scent score
is incremented upward at a slower rate than the short-term scent score, causing the short-term scent score to be more
sensitive to recent activities.

[0051] The tables included in the match database 104 include an item index 208, a linkage table 210, a hit table 212,
and a correlations table 214, and are displayed in FIGs. 3, 4, 5, and 6, respectively. The item index 208, as shown in
FIG. 3, includes a unique item identifier, the source address for the item, the total number of visitors who have accessed
the item, the number of scent score entries in the database for the item, the number of links from the item, the forward
scan status, and the backward scan status. The forward scan status and the backward scan status relate generally to
methods of looking at information resources which refer to a given information source, or to which the given information source
refers. Forward scanning involves examining the information sources targeted by links in the information source at
hand. In this context, links may be items such as hyperlinks in a web page or bibliographic information in a particular
document. Reverse scanning involves utilizing information organization resources such as search engines on the World
Wide Web to find information resources that refer to the information resource at hand. By viewing information sources
related by links to the information source at hand, it is possible to determine other potential information sources of
interest to a given user 110. The forward scan status and backward scan status are established by a forward scanner and
a reverse scanner, respectively, which are shown in the context of the present invention in FIG. 7. The entries for forward
scan status and backward scan status may include the time stamp of the last scan or may simply indicate that scanning
has taken place. The scanning process may take place only once after an item has been visited, or it may take place at
specified intervals. Furthermore, the scanning processes may be utilized with items that have actually been visited,
or they may extend to items not yet visited but which have accumulated a scent score. The actual extent and timing of the
scanning processes may be tailored to the particular application. The information gathered from the scanning process
is used to update the linkage table.

[0052] The linkage table 210 is shown in FIG. 4, and contains information including source item identification,
destination item identification, and a linkage value for each item pair. The hit table 212 is shown in FIG. 5, and includes the
user identification, the unique item identification, the time stamp of the last access event at that item, the short-term
scent score for that item, and the long-term scent score for that item. The correlations table 214 is shown in FIG. 6, and
includes the user identification for a first particular user, shown as "User ID X", the user identification for a second
particular user, shown as "User ID Y", the short-term scent match score between the users, the long-term scent match
score between the users, and the long-term scent score to short-term scent score match between the users. Referring
back to FIG. 2, the decay engine 216 of the matcher 106 operates by periodically decaying the entries in the item index
208, the linkage table 210, and the hit table 212. In the linkage table 210, each linkage value is also decayed. As
discussed, this reduction may be by a specific percentage, a scalar value, or by other methods depending on the needs of
the particular application. The decay engine 216 operates much the same way on the hit table 212, reducing the
short-term scent score and the long-term scent score. The particular reduction method or degree may vary for each of the
items to be reduced, i.e. it may be different for the decay of the short-term scent score than it is for the decay of the
long-term scent score. If a short-term scent score or long-term scent score for a particular user corresponding to a particular
item on the hit table 212 becomes decayed below a threshold value, the decay engine 216 may remove its entry in the
hit table 212. If all scores for a particular item for all users on the hit table 212 become decayed below the threshold, the
decay engine 216 may prune its entry from the item index 208, and may also prune entries that incorporate it in the
linkage table 210. In operation, the decay engine is not critical. However, it serves a cleanup function in order to eliminate
unnecessary entries from the match database 104, to streamline the database size.
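
The cleanup role of the decay engine can be illustrated with a small Python sketch. The data layout, the function name `prune`, and the threshold of 0.01 are hypothetical; the sketch also assumes that a hit entry is removed only when both of its scores have fallen below the threshold, which is one possible reading of the text.

```python
def prune(hit_table, item_index, linkage_table, threshold=0.01):
    """Drop decayed hit entries, then items and linkages left without any entry.

    hit_table: {(user, item): (SS, SL)}
    item_index: set of item identifiers
    linkage_table: {(source, destination): linkage}
    """
    # Keep an entry while at least one of its scores is still at or above the threshold.
    hits = {key: (ss, sl) for key, (ss, sl) in hit_table.items()
            if ss >= threshold or sl >= threshold}
    live_items = {item for (_user, item) in hits}
    index = {item for item in item_index if item in live_items}
    links = {(s, d): v for (s, d), v in linkage_table.items()
             if s in live_items and d in live_items}
    return hits, index, links


hits, index, links = prune(
    {("u", "A"): (0.5, 0.3), ("u", "B"): (0.001, 0.002)},
    {"A", "B"},
    {("A", "B"): 0.4},
)
```

In this example item B has decayed for every user, so its hit entry, its index entry, and the linkage that involves it are all removed.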

[0053] The link and hit counter 218 of the matcher 106 provides a counting mechanism for each information item
listed in the item index 208. It searches the linkage table 210 to determine the number of links from each item, and
searches the hit table 212 to determine the number of users 110 who have visited a particular information source. The
link and hit counter 218 provides a summary statistic in the item index 208 in order to keep track of the total number of
users 110 who have visited the particular information item. The link and hit counter 218 also examines the hit table 212
to determine the total number of scent scores for each information item and provides the total in the item index 208.

[0054] The diffusion engine 220 of the matcher 106 propagates the long-term scent scores and the short-term
scent scores for a particular user from the hit table 212 to items that may be considered similar, via the linkage entries
in the linkage table 210, by the method previously discussed for scent score diffusion. Entries for items to which the
scent scores have been diffused are then either added to the hit table 212 or, if they already exist, are modified with
their corresponding scent score values. The hit table 212 will not, however, register a time of last hit for the information
items to which scent scores have propagated for a particular user 110, but which have not yet been visited by that
particular user 110. In this way, information sources that have actually been visited by a particular user 110 may be
distinguished from those that have not. Preferably, the diffusion engine 220 includes criteria that will prevent it from diffusing
scent scores to certain items and item types. The criteria are necessary to prevent diffusion of scent scores to irrelevant
information sources. For example, as discussed, it is undesirable to diffuse the scent scores through popular or general
web pages, such as major corporate homepages or search engines, which have large traffic volumes, but which are not
particularly useful for matching people's interests. The criteria for exclusion of certain information sources from the
diffusion process may be set by examining variables for a particular item, such as the scent scores and the number of links
to or from the item.

[0055] The correlation engine 222 of the matcher 106 correlates the scent scores from the hit table 212 for pairs of
users 110 and determines and updates the short-term match scores, the long-term match scores, and the long-term to
short-term match scores for each pair of users.

[0056] With regard to the system of FIG. 2 and the tables of FIGs. 3, 4, 5, and 6, it is important to note that many
configurations may be developed utilizing the same general components. For example, the elements that comprise the
entry processor 102, the match database 104, and the matcher 106 are somewhat arbitrarily grouped for clarity of
explanation. In a particular embodiment, the grouping of elements may be much different than that presented in the
drawings and described without having an appreciable effect on the system's functionality. More specifically, for
example, the tables of the database 104 may be constructed such that the information collected is grouped differently among
the tables. The importance lies in their use, not their specific embodiment, as the construction of the database will vary
depending on such factors as the software used, the particular application, and the database developer. Similar
variations are both possible and likely for other components, including those presented in FIG. 7.

[0057] FIG. 7 provides a diagram of the system of the present invention that includes privacy enhancements to
allow users 110 to remain anonymous. The monitor 100, the entry processor 102, the match database 104, the matcher
106, the match server 108, and the message server 112 are all elements shown in FIG. 1 and FIG. 2, which have been
previously described. FIG. 7 also shows the interaction of the forward scanner 700 and the reverse scanner 702 with
the match database 104. The forward scanner 700 examines the contents of information sources 706 to determine
information sources to which they refer. In the case of the World Wide Web, for example, this process may take the form
of following links that exist in the contents of a particular web page and distributing scent scores to their associated web
pages. The reverse scanner 702 gathers information from external resources 704 such as search engines, indexes, and
other resources that provide information about and organize information resources that include references to the
information resource at hand. As discussed previously, information gathered through the forward scanner 700 and the
reverse scanner 702 is used to diffuse the scent scores from information sources actually visited to information sources
not yet visited, but which may contain information of interest. The forward scanner 700 and reverse scanner 702 are not
critical components, but rather are designed to enhance the diffusion process, and may be incorporated jointly or
individually.

[0058] The privacy enhancements 708 include a log scrubber 710 and an anonymous user translation map 712.
The log scrubber 710 removes the identity of the user 110 and replaces it with an arbitrary or anonymous name. The
anonymous user translation map 712 provides a means for keeping track of the arbitrary or anonymous name
associated with a particular user 110. The log scrubber 710 utilizes the anonymous user translation map 712 to determine
whether a user 110 has previously been given an anonymous name, and if so it utilizes the same anonymous name for
the current session. If the user 110 has not previously been given an anonymous name, a new one is generated and
stored in the anonymous user translation map 712. The mirror proxy 714 acts as an information server, such as a Web
server. However, when a user accesses the mirror proxy 714, it obtains their identification, finds the corresponding
anonymous identification entry in the anonymous user translation map 712, queries the match server 108 with the
anonymous identification, and returns the results to the user. In other words, when a user has entered a request, the request
is translated into a request containing the user's anonymous identification in place of their actual identification, and the
results of the request are returned to the user. Note that the message server 112 is accessed through the match server
108, thus allowing for anonymous messaging between users 110. Thus, users of the system can locate other users with
similar interests and may exchange information, but need not reveal their true identity unless so desired.

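
The interplay of the log scrubber and the anonymous user translation map can be sketched in Python. The class name `AnonymousUserMap`, the function name `scrub`, the log entry layout, and the use of random hexadecimal tokens as anonymous names are assumptions for the example.

```python
import secrets


class AnonymousUserMap:
    """Hypothetical translation map from real user identifiers to stable pseudonyms."""

    def __init__(self):
        self._alias = {}

    def alias_for(self, user_id):
        """Return the existing pseudonym for user_id, creating one on first use."""
        if user_id not in self._alias:
            self._alias[user_id] = "anon-" + secrets.token_hex(8)
        return self._alias[user_id]


def scrub(entry, user_map):
    """Return a copy of a log entry with the user field replaced by its pseudonym."""
    cleaned = dict(entry)
    cleaned["user"] = user_map.alias_for(entry["user"])
    return cleaned


user_map = AnonymousUserMap()
first = scrub({"user": "alice", "item": "page_x"}, user_map)
again = scrub({"user": "alice", "item": "page_y"}, user_map)
other = scrub({"user": "bob", "item": "page_x"}, user_map)
```

Because the map is consulted before a new name is generated, the same user keeps the same anonymous name across sessions, which is what allows scent scores to accumulate without the real identity being stored alongside them.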
[0059] FIG. 8 is a flow chart generally outlining the steps provided by the present invention. As shown in the
diagram, and as discussed relative to the system shown in FIGs. 1, 2, and 7, the first step is to provide a user history as
shown by box 800. The information for the user history may consist of historical data about the interaction of a plurality 
of users with a plurality of information items or it may be generated through real-time user interaction with an information 
resource such as the World Wide Web. The next step is to filter the information items in the user history in order to elim- 
inate those that are unlikely to provide useful information for collaborator discovery, as represented by box 802. These 
items, as mentioned previously, include items which are likely to be accessed by a broad base of people regardless of 
any common interests. Examples of items that fit into this category include major search engines on the Internet and 
major newspapers, magazines, or other publications in the case of a library system utilizing the present invention. The 
next step is to determine a similarity measure of the information items accessed, as shown by box 804. The similarity 
measure, as discussed previously, may be developed by a number of means such as the proximity of user accesses of 
several information items, user organization activities such as bookmarking web pages in a browser bookmark file and 
arranging items on the desktop of a computer, content analysis of information items, and express user similarity ratings. 
After generating similarity measures between information items, scent scores are associated between each particular 
information item and each user accessing the particular information item. The scent score association is represented 
by box 806. Next, utilizing the scent scores and the similarity measure between information items, the scent scores are 
diffused, or propagated, to other information items by generating a diffused scent score, derived from the scent score 
at the item from which the scent is to be diffused and the measure of similarity, and applying the derived scent score to 
the existing scent score of the item to which the scent score is diffused. The diffusion process is represented by box 
808. After the diffusion process, the scent scores for all information items for all users are correlated in order to deter- 
mine users who potentially share common interests, as represented by box 810. As shown by box 812, the results of 
the correlation may be provided to the users in order to assist them in finding collaborators. After correlation of the scent 
scores and extraction of relevant information during a particular iteration of the steps, the scent scores are decayed, as 
shown by box 112. The decay, as discussed previously, may take place linearly by a fixed amount for each iteration, or 
it may be performed by other methods. 

[0060] In accordance with the present invention, a specific embodiment has been developed using Microsoft 
Access™ 97, and is readily adaptable to other databases and database languages such as Dbase and SQL. The
pertinent details of this embodiment are discussed below. It is important to note that the description and code below are
presented for illustration and clarity, and that they focus primarily on aspects of the invention that are best illustrated by
example. Portions of the invention not described by the code are substantially as described in other areas of this spec- 
ification. Reference numbers will also be provided so that the details of the embodiment may be keyed to FIGs. 1 to 7 
and their respective descriptions. 

[0061] Prior to the creation of a new entry in the hit table 212, a decay query is run to reduce the short-term and 
long-term scent scores. The decay query, as was discussed previously, may be run periodically at preset intervals, or 
may be triggered by particular events. The code of the decay query is as follows: 

UPDATE DISTINCTROW hitTable AS H SET H.ST_Scent = [H].[ST_Scent]*0.5,
H.LT_Scent = [H].[LT_Scent]*0.8;

where hitTable represents the hit table 212, ST_Scent represents the short-term scent score, and LT_Scent represents 
the long-term scent score. In this case, the short-term and long-term scents are decayed through multiplication by sca- 
lar values of 0.5 and 0.8, respectively. 

[0062] Next, a new entry in the hit table 212 is created for the initial visit to a particular information item. The
information for the hit table 212 is gathered from the monitor 100 and entered via the entry processor 102. After entries have
been added, queries are performed to handle the diffusion and matching functions.

[0063] The following query is first performed to remove entries from the linkage table 210 for information sources
that are linked to themselves.

DELETE DISTINCTROW L.SourcePageID
FROM linkageTable AS L WHERE (((L.SourcePageID)=[L].[DestPageID]));

where linkageTable is the linkage table 210, SourcePageID is the Source Page Identification, and DestPageID is the
Destination Page Identification.

[0064] After removing self-linked information sources from the linkage table 210, a first diffusion cycle is run. The
code for the diffusion cycle includes several parts as follows:

SELECT Count(hitTable.LastHit) AS Visitors, hitTable.PageID, Count(hitTable.UserID) AS
Scents INTO hitsPerPage FROM hitTable
GROUP BY hitTable.PageID;

where hitTable is the hit table 212, LastHit is the Last Hit Time Stamp, PageID is the Page Identification, and UserID is
the User Identification. The code counts the number of visitors to a particular information source (entries that carry a
last hit time stamp) and the number of user scent score entries at that information source. This information is stored in a
temporary table, and then is transferred from the temporary table into the page index table 208 by the code below.

UPDATE DISTINCTROW hitsPerPage INNER JOIN pageIndex ON hitsPerPage.PageID =
pageIndex.PageID SET pageIndex.Visitors = [hitsPerPage].[Visitors], pageIndex.Scents =
[hitsPerPage].[Scents]

[0065] Next the linkage table 210 is updated by the following code. The results of the update are initially stored in 
a temporary table. 

SELECT DISTINCTROW linkTable.SourcePageID, Count(linkTable.DestPageID) AS
Linkages INTO LinkCount FROM linkTable
GROUP BY linkTable.SourcePageID

[0066] The results from the temporary table are then transferred into the page index table 208 by the code below. 

UPDATE DISTINCTROW LinkCount INNER JOIN pageIndex ON
LinkCount.SourcePageID = pageIndex.PageID SET pageIndex.Linkages =
[LinkCount].[Linkages]

35 [0067] The actual diffusion step of the first diffusion cycle is performed by the next several portions of code. First, 
information sources that are linked to other information sources that have scent scores are collected. 

SELECT S.UserID, L.DestPageID, S.ST_Scent, S.LT_Scent, L.Linkage, P.Scents INTO
step1Table FROM (hitTable AS S INNER JOIN linkTable AS L ON S.PageID =
L.SourcePageID) INNER JOIN pageIndex AS P ON S.PageID = P.PageID
WHERE ((([P].[Scents]*[P].[Linkages])<40))
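The collection step is a join of the hit table with the linkage table and the page index, restricted to source pages whose scent count times link count stays below 40. Reading the operator in the WHERE clause as a multiplication is an interpretation of the printed code. The Python sketch below restates the join over in-memory rows; the tuple layouts and the name `collect_candidates` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical row layouts, for illustration only.
Hit = Tuple[str, str, float, float]  # (user_id, page_id, st_scent, lt_scent)
Link = Tuple[str, str, float]        # (source_page_id, dest_page_id, linkage)
PageStats = Tuple[int, int]          # (scents, linkages)
Candidate = Tuple[str, str, float, float, float, int]


def collect_candidates(
    hits: List[Hit],
    links: List[Link],
    page_index: Dict[str, PageStats],
    limit: int = 40,
) -> List[Candidate]:
    """Pair each user's scent at a source page with the pages it links to."""
    links_by_source: Dict[str, List[Tuple[str, float]]] = {}
    for src, dst, linkage in links:
        links_by_source.setdefault(src, []).append((dst, linkage))

    rows: List[Candidate] = []
    for user_id, page_id, st, lt in hits:
        if page_id not in page_index:  # inner join: unindexed pages drop out
            continue
        scents, linkages = page_index[page_id]
        if scents * linkages >= limit:  # skip heavily shared, heavily linked pages
            continue
        for dst, linkage in links_by_source.get(page_id, []):
            rows.append((user_id, dst, st, lt, linkage, scents))
    return rows


candidates = collect_candidates(
    hits=[("u1", "p1", 0.5, 0.2), ("u2", "hub", 0.9, 0.9)],
    links=[("p1", "p2", 0.5), ("hub", "p3", 1.0)],
    page_index={"p1": (1, 1), "hub": (10, 4)},
)
```

In this example the page "hub" has 10 scents and 4 links, so its product reaches the limit and nothing diffuses from it.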

[0068] Next, scents of zero value are inserted as placeholders in the hit table 212 for information sources that are 
to receive scents through diffusion. 

INSERT INTO hitTable (UserID, PageID, ST_Scent, LT_Scent) SELECT DISTINCT
L.UserID, L.DestPageID, 0 AS Expr1, 0 AS Expr2 FROM step1Table AS L WHERE
(((Exists (SELECT H2.PageID FROM hitTable AS H2 WHERE H2.PageID = L.DestPageID
AND H2.UserID = L.UserID))=False))
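The placeholder insertion amounts to an "insert if absent" over (user, destination page) pairs, so that the later update always finds a row to write to. A minimal Python sketch, assuming the hit table is modelled as a dictionary keyed by (user, page); the name `insert_placeholders` is hypothetical:

```python
from typing import Dict, Iterable, Tuple

ScentTable = Dict[Tuple[str, str], Tuple[float, float]]  # (user, page) -> (ST, LT)


def insert_placeholders(
    scent_table: ScentTable, targets: Iterable[Tuple[str, str]]
) -> int:
    """Add a zero-scent row for every (user, page) target not yet present.

    Existing rows are left untouched. Returns the number of rows added.
    """
    added = 0
    for key in targets:
        if key not in scent_table:
            scent_table[key] = (0.0, 0.0)
            added += 1
    return added


scent_table: ScentTable = {("u1", "p1"): (0.5, 0.2), ("u1", "p2"): (0.1, 0.1)}
added = insert_placeholders(
    scent_table, [("u1", "p2"), ("u1", "p3"), ("u1", "p3")]
)
```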

[0069] Next, values are calculated for an intermediate table, named step2table, utilizing an approximation of the 
scent update formula described previously and represented by the following code. 

SELECT DISTINCTROW S.UserID, S.DestPageID, Avg(([S].[ST_Scent]-
[D].[ST_Scent])*[S].[Linkage]) AS ST_dev, Avg(([S].[LT_Scent]-
[D].[LT_Scent])*[S].[Linkage]) AS LT_dev, Count(S.DestPageID) AS Sources, D.ST_Scent
AS currentST, D.LT_Scent AS currentLT, (1-([Sources]*[ST_dev])+(([Sources]-
1)*[Sources]*[ST_dev]*[ST_dev]/2))*([currentST]-1)+1 AS ST_new, (1-
([Sources]*[LT_dev])+(([Sources]-1)*[Sources]*[LT_dev]*[LT_dev]/2))*([currentLT]-1)+1
AS LT_new INTO step2Table FROM step1Table AS S INNER JOIN hitTable AS D ON
(S.DestPageID = D.PageID) AND (S.UserID = D.UserID)
WHERE (((S.Scents)>0) AND ((S.LT_Scent)>[D].[LT_Scent]))
GROUP BY S.UserID, S.DestPageID, D.ST_Scent, D.LT_Scent
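The ST_new expression can be restated compactly. With n the number of contributing sources and d the average of (source scent - current scent) * linkage, the code computes (1 - n*d + n*(n-1)*d*d/2) * (current - 1) + 1. The bracketed factor matches the first three terms of the binomial expansion of (1 - d)^n, so one way to read the expression is as a second-order approximation of 1 - (1 - current) * (1 - d)^n. That reading is an observation about the arithmetic, not a statement from the text. The Python sketch below evaluates both forms; the function names are hypothetical, and the caller is assumed to have already applied the filtering of the WHERE clause.

```python
from typing import List, Tuple


def _mean_dev(current: float, sources: List[Tuple[float, float]]) -> float:
    """Average of (source scent - current scent) * linkage over all sources."""
    return sum((s - current) * linkage for s, linkage in sources) / len(sources)


def approx_new_scent(current: float, sources: List[Tuple[float, float]]) -> float:
    """Scent at a destination after one cycle, as in the ST_new expression.

    `sources` holds (source scent, linkage) pairs for one user and one page.
    """
    n = len(sources)
    if n == 0:
        return current
    dev = _mean_dev(current, sources)
    factor = 1 - n * dev + (n - 1) * n * dev * dev / 2
    return factor * (current - 1) + 1


def closed_form_new_scent(current: float, sources: List[Tuple[float, float]]) -> float:
    """The untruncated form 1 - (1 - current) * (1 - dev) ** n, for comparison."""
    n = len(sources)
    if n == 0:
        return current
    dev = _mean_dev(current, sources)
    return 1 - (1 - current) * (1 - dev) ** n


two_sources = [(0.5, 0.2), (0.3, 0.2)]
approx = approx_new_scent(0.0, two_sources)
exact = closed_form_new_scent(0.0, two_sources)
```

For one or two sources the two forms agree exactly, since the truncated expansion then contains every term; for more sources they differ by terms of order d cubed.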

[0070] Next, the results from the intermediate step2table are transferred into the hit table 212 by the following code.

UPDATE DISTINCTROW hitTable AS H, step2Table AS new SET H.ST_Scent =
[new].[ST_new], H.LT_Scent = [new].[LT_new] WHERE (((H.UserID)=[new].[UserID])
AND ((H.PageID)=[new].[DestPageID]))

[0071] The code steps above complete the first diffusion cycle. Subsequently, a second diffusion cycle is performed
by several steps, which are set forth below, along with the appropriate code for each. As in the first diffusion cycle, the
code below counts the number of visitors to a particular information source and the number of users who have visited 
a particular information source. 

SELECT Count(hitTable.LastHit) AS Visitors, hitTable.PageID, Count(hitTable.UserID) AS
Scents INTO hitsPerPage FROM hitTable
GROUP BY hitTable.PageID

[0072] The information just obtained is stored in a temporary table, and then is transferred from the temporary table 
into the page index table 208 by the code below. 

UPDATE DISTINCTROW hitsPerPage INNER JOIN pageIndex ON hitsPerPage.PageID =
pageIndex.PageID SET pageIndex.Visitors = [hitsPerPage].[Visitors], pageIndex.Scents =
[hitsPerPage].[Scents]

[0073] The actual diffusion step of the second diffusion cycle is performed by the next several portions of code. 
First, information sources that are linked to other information sources that have scent scores are collected. 

SELECT S.UserID, L.DestPageID, S.ST_Scent, S.LT_Scent, L.Linkage, P.Scents INTO
step1Table FROM (hitTable AS S INNER JOIN linkTable AS L ON S.PageID =
L.SourcePageID) INNER JOIN pageIndex AS P ON S.PageID = P.PageID WHERE
((([P].[Scents]*[P].[Linkages])<40))

[0074] Next, scents of zero value are inserted as placeholders in the hit table 212 for information sources that are 
to receive scents through diffusion. 

INSERT INTO hitTable (UserID, PageID, ST_Scent, LT_Scent)
SELECT DISTINCT L.UserID, L.DestPageID, 0 AS Expr1, 0 AS Expr2
FROM step1Table AS L
WHERE (((Exists (SELECT H2.PageID FROM hitTable AS H2
WHERE H2.PageID = L.DestPageID AND
H2.UserID = L.UserID))=False))

[0075] Next, values are calculated for an intermediate table, named step2table, utilizing an approximation of the 
scent update formula described previously and represented by the following code. 

SELECT DISTINCTROW S.UserID, S.DestPageID, Avg(([S].[ST_Scent]-
[D].[ST_Scent])*[S].[Linkage]) AS ST_dev, Avg(([S].[LT_Scent]-
[D].[LT_Scent])*[S].[Linkage]) AS LT_dev, Count(S.DestPageID) AS Sources, D.ST_Scent
AS currentST, D.LT_Scent AS currentLT, (1-([Sources]*[ST_dev])+(([Sources]-
1)*[Sources]*[ST_dev]*[ST_dev]/2))*([currentST]-1)+1 AS ST_new, (1-
([Sources]*[LT_dev])+(([Sources]-1)*[Sources]*[LT_dev]*[LT_dev]/2))*([currentLT]-1)+1
AS LT_new INTO step2Table
FROM step1Table AS S INNER JOIN hitTable AS D ON (S.DestPageID = D.PageID) AND
(S.UserID = D.UserID)
WHERE (((S.Scents)>0) AND ((S.LT_Scent)>[D].[LT_Scent]))
GROUP BY S.UserID, S.DestPageID, D.ST_Scent, D.LT_Scent

[0076] Next, the results from the intermediate step2table are transferred into the hit table 212 by the following code.

UPDATE DISTINCTROW hitTable AS H, step2Table AS new SET H.ST_Scent =
[new].[ST_new], H.LT_Scent = [new].[LT_new]
WHERE (((H.UserID)=[new].[UserID]) AND ((H.PageID)=[new].[DestPageID]))

[0077] In order to correlate users to determine potential user matches based on interests, the following steps are
performed in the specific embodiment.

(1) The hit counts are updated.

(2) The dot product of each user's scents with each other user's scents is taken, utilizing the total number of scents
at each information item as the divisor. This operation is generated by the following code.

SELECT T.UserID, S.UserID, Sum([T].[ST_Scent]*[S].[ST_Scent]/(P.Scents)) AS ST_Sum,
Sum(T.LT_Scent*S.LT_Scent/(P.Scents)) AS LT_Sum,
Sum(T.LT_Scent*S.ST_Scent/(P.Scents)) AS LT_ST INTO correlationsTable
FROM hitTable AS T, hitTable AS S, pageIndex AS P WHERE (((T.PageID)=[S].[PageID]
And (T.PageID)=[P].[PageID]) AND ((P.Scents)>0)) GROUP BY T.UserID, S.UserID

(3) Next, normalizing terms for each user are determined in order to reduce the individual user's match if the indi- 
vidual user has many strong scents. 

SELECT DISTINCTROW H.UserID, Sqr(Sum(H.ST_Scent * H.ST_Scent) + 1) AS
ST_norm, Sqr(Sum(H.LT_Scent * H.LT_Scent) + 1) AS LT_norm INTO userNorms
FROM hitTable AS H GROUP BY H.UserID

(4) Next, the scores in the correlations table 214 are divided by a product of the user's norms for each pair of users
in order to produce the final resulting user match scores. This operation is carried out by the following two code 
blocks. 

SELECT A.S_UserID, A.T_UserID, A.ST_Sum/ST_norm AS ST, A.LT_Sum/LT_norm AS
LT, A.LT_ST/LT_ST_norm AS LT_ST INTO norm4resultsTemp
FROM correlationsTable AS A INNER JOIN correlX4 AS B ON (A.S_UserID =
B.S_UserID) AND (A.T_UserID = B.T_UserID)
WHERE (((A.T_UserID)<>[A].[S_UserID]));

correlX4: SELECT A.UserID AS S_UserID, B.UserID AS T_UserID,
A.ST_norm*B.ST_norm AS ST_norm, A.LT_norm*B.LT_norm AS LT_norm,
A.LT_norm*B.ST_norm AS LT_ST_norm
FROM userNorms AS A, userNorms AS B;

[0078] The final match results are then available within the norm4resultsTemp table, from which they may be
accessed by users. 
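The correlation and normalization steps can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions, not the embodiment's code: the hit table is modelled as a dictionary keyed by (user, page), the number of scents at a page is taken to be the number of rows for that page, and the mixed long-term/short-term score is omitted for brevity. The name `match_scores` is hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Tuple

ScentTable = Dict[Tuple[str, str], Tuple[float, float]]  # (user, page) -> (ST, LT)


def match_scores(scent_table: ScentTable) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return normalized (ST, LT) match scores for each ordered pair of users."""
    rows_by_page = defaultdict(list)
    squares = defaultdict(lambda: [0.0, 0.0])
    for (user, page), (st, lt) in scent_table.items():
        rows_by_page[page].append((user, st, lt))
        squares[user][0] += st * st
        squares[user][1] += lt * lt

    # Per-user norms, with the +1 inside the square root as in step (3).
    norms = {u: (sqrt(s[0] + 1), sqrt(s[1] + 1)) for u, s in squares.items()}

    # Dot products over shared pages, each term divided by the scent count there.
    sums = defaultdict(lambda: [0.0, 0.0])
    for rows in rows_by_page.values():
        scents_at_page = len(rows)
        for t_user, t_st, t_lt in rows:
            for s_user, s_st, s_lt in rows:
                if t_user == s_user:  # a user is not matched against itself
                    continue
                sums[(t_user, s_user)][0] += t_st * s_st / scents_at_page
                sums[(t_user, s_user)][1] += t_lt * s_lt / scents_at_page

    return {
        (t, s): (
            st_sum / (norms[t][0] * norms[s][0]),
            lt_sum / (norms[t][1] * norms[s][1]),
        )
        for (t, s), (st_sum, lt_sum) in sums.items()
    }


scores = match_scores({("a", "p1"): (1.0, 0.5), ("b", "p1"): (1.0, 0.5)})
```

Dividing each term by the number of scents at the page gives less weight to pages that many users share, and the per-user norms reduce the score of a user who holds many strong scents.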

Claims 

1. A method for collaborator discovery among a plurality of users including the steps of:

(a) providing a user history including a plurality of entries, with each entry including a user identity associated 
with each particular user and a reference to a particular item accessed by that user; 

(b) associating particular items in the user history by providing a measure of similarity between the particular
items;

(c) uniquely associating at least one scent score to each particular item accessed by a particular user; 

(d) diffusing the at least one scent score associated with a particular item accessed by a particular user to
another item by generating at least one diffusion scent score from the combination of the measure of similarity
between the particular item and the other item and the at least one scent score, and incorporating the at least
one diffusion scent score into the at least one scent score of the other item;

(e) repeating step (d) for all items which have at least one scent score; and 

(f) determining scent match scores by correlating the scent scores from all of the particular items to find users 
with common interests. 

2. A method for collaborator discovery among a plurality of users as set forth in claim 1, wherein the user history is
generated by monitoring and recording the real-time accesses of the plurality of users, and wherein steps (b)
through (f) of claim 1 are repeated a plurality of times to provide a continual update of the scent scores.

3. A method for collaborator discovery among a plurality of users as set forth in claim 1, further including the step of
filtering the items in the user history using a criterion to eliminate undesirable items from the user history.

4. A method for collaborator discovery among a plurality of users as set forth in claim 1, wherein the measure of sim-
ilarity between the particular items associated in step (b) is based on the temporal proximity of access between the
particular items.

5. A method for collaborator discovery among a plurality of users as set forth in claim 1, wherein the scent scores are
represented by scalar values, and wherein a step of increasing the scent scores in proportion to the number of
times a particular item is accessed by a particular user is added between steps (c) and (d).

6. A method for collaborator discovery among a plurality of users as set forth in claim 1, further including the step of
providing each of the plurality of users with information regarding the correlation of their scent scores with the scent
scores of others of the plurality of users after step (f).

7. A method for collaborator discovery among a plurality of users as set forth in claim 1, wherein the plurality of users
are provided with anonymous identifications.

8. A method for collaborator discovery among a plurality of users as set forth in claim 1, wherein each of the plurality
of users is provided a method for messaging to allow interaction between the plurality of users.

9. A method for collaborator discovery among a plurality of users as set forth in claim 2, wherein items accessed 
include Internet web pages. 

10. A method for collaborator discovery among a plurality of users as set forth in claim 2, wherein the scent scores are 
decayed over time. 

11. A method for collaborator discovery among a plurality of users as set forth in claim 2, wherein the at least one scent
score for each particular user and information item includes a short-term scent score and a long-term scent score, 
and where, for each subsequent access of each particular item by a particular user, the short-term scent score and 
long-term scent score are increased in proportion to the number of accesses by the particular user such that the 
short-term scent score increases more rapidly than the long-term scent score. 

12. A method for collaborator discovery among a plurality of users as set forth in claim 5, wherein a maximum scent 
score value is set such that when a particular scent score reaches the maximum scent score value, it ceases to 
increase. 

13. A method for collaborator discovery among a plurality of users as set forth in claim 12, wherein the long-term scent 
scores and the short-term scent scores are decayed over time with a decay rate such that the long-term scent 
scores are decayed more slowly than the short-term scent scores. 

14. A method for collaborator discovery among a plurality of users as set forth in claim 13, 

a. wherein the short-term scent score and long-term scent scores are associated with each particular user 
according to the following, 

SS = CS
SL = CL

wherein SS represents the short-term scent score, SL represents the long-term scent score, and CS and CL
are scalar values chosen as scent score values assigned for the first access of a particular item by a particular
user;

b. wherein the short-term scent score and the long-term scent score are increased according to the following,

SS = SS + (1 - SS) * KS and

SL = SL + (1 - SL) * KL, wherein

SS represents the short-term scent score, SL represents the long-term scent score,
KS and KL represent incrementing rates chosen such that KS > KL;

c. wherein the decay is performed according to the following,

SS = SS * DS and

SL = SL * DL, wherein

SS represents the short-term scent score, SL represents the long-term scent score,
DS and DL represent decay rates chosen such that DS < DL.

15. A method for collaborator discovery among a plurality of users as set forth in claim 14, wherein the item from which
the scent score is diffused is identified as a source item A and the item to which the scent score is diffused is iden-
tified as a destination item B, and the scent score diffusion is performed according to,

if $SS_A > SS_B$: $SS'_B = SS_B + (SS_A - SS_B) \cdot L_{AB} \cdot r_S$, and

if $SL_A > SL_B$: $SL'_B = SL_B + (SL_A - SL_B) \cdot L_{AB} \cdot r_L$, wherein

$SS_A$ represents the short-term scent for a particular user at the source item A,

$SS_B$ represents the short-term scent for a particular user at the destination item B,

$SL_A$ represents the long-term scent for a particular user at the source item A,

$SL_B$ represents the long-term scent for a particular user at the destination item B,

$L_{AB}$ represents the measure of similarity between the source item A and the destination item B, $r_S$ provides a
short-term scent diffusion rate, and $r_L$ provides a long-term scent diffusion rate.

16. A method for collaborator discovery among a plurality of users as set forth in claim 15, wherein the correlation of
the scent scores between user a, representing a particular one of the plurality of users, and user b, representing
another of the plurality of users, where item p represents a particular one of the plurality of items, is performed by
the following,

$$SS\_Match_{ab} = \sum_p \frac{SS_{ap} \cdot SS_{bp}}{Stot_p},$$

$$SL\_Match_{ab} = \sum_p \frac{SS_{ap} \cdot SL_{bp}}{Stot_p}, \text{ and}$$

$$LL\_Match_{ab} = \sum_p \frac{SL_{ap} \cdot SL_{bp}}{Stot_p}, \text{ where}$$

$SS\_Match_{ab}$ is the match between short-term scent scores of user a and user b;
$SL\_Match_{ab}$ is the match between the short-term scent score of user a and the long-term scent score of user b;
$LL\_Match_{ab}$ is the match between the long-term scent scores of users a and b;
$Stot_p$ is the total number of distinct user scent scores that can be found at item p;
$SS_{ap}$ is the short-term scent score assigned to user a at item p; and
$SL_{ap}$ is the long-term scent score assigned to user a at item p.

17. A system for collaborator discovery among a plurality of users including: 

a. a monitor which provides a user history, said user history including a plurality of entries, with the plurality of 
entries including a user identity associated with each particular user and a reference to a particular item 
accessed by that user, 

b. an entry processor connected to the monitor to receive the plurality of entries of the user history from the 
monitor, said entry processor operative to associate pairs of particular items in the user history to provide a 
measure of similarity for each pair of particular items, and to uniquely associate at least one scent score for 
each particular item accessed by a particular user; 

c. a match database connected to the entry processor to receive and store the measure of similarity and the 
at least one scent score; 

d. a matcher connected to the match database to receive the measure of similarity and the at least one scent 
score, and to diffuse the at least one scent score to other particular items in the user history in proportion to 
the measure of similarity and to correlate the scent scores of all of the particular items in the user history to 
determine users with common interests. 

18. A system for collaborator discovery among a plurality of users as set forth in claim 17, wherein the user history pro-
vided by the monitor is generated from information regarding the real-time activities of users with respect to a plu-
rality of items.

19. A system for collaborator discovery among a plurality of users as set forth in claim 17, further including means for
providing user anonymity.

20. A system for collaborator discovery among a plurality of users as set forth in claim 17, further including means to 
allow for interaction between users. 

21. A system for collaborator discovery among a plurality of users as set forth in claim 17, wherein the entry processor
includes an information item type filter operative to eliminate unimportant entries upon receipt from the monitor to
provide a plurality of filtered entries.

22. A system for collaborator discovery among a plurality of users as set forth in claim 17, wherein,

a. the entry processor further includes an item association engine, said item association engine linked to the
information item type filter to receive the plurality of filtered entries therefrom and to assign a measure of sim-
ilarity for each pair of particular items; and

b. the match database further includes means to receive and store the measure of similarity for each pair of
particular items from the entry processor.

23. A system for collaborator discovery among a plurality of users as set forth in claim 17, wherein the users of the sys-
tem access the system by computer and wherein the monitor is distributed across the computers used by the users.

24. A system for collaborator discovery among a plurality of users as set forth in claim 17, wherein the monitor is cen- 
tralized. 

25. A system for collaborator discovery among a plurality of users as set forth in claim 18, wherein, 

a. the entry processor further includes a scent update engine which receives the plurality of filtered entries from 
the filter and uniquely associates at least one scent score for each particular item accessed by a particular 
user; and 

b. the match database further includes means to receive and store the at least one scent score for each par-
ticular item accessed by a particular user from the scent update engine.

26. A system for collaborator discovery among a plurality of users as set forth in claim 18, wherein the matcher further
includes a diffusion engine linked to the match database to receive the measure of similarity for each pair of partic-
ular items and to receive the scent score corresponding to at least one particular item of the pair of particular items
for which the measures of similarity were received, and further to utilize the measures of similarity and the at least
one scent score to diffuse the at least one scent score from one item of the pair of particular items to the other item
of the pair of particular items to generate a diffused scent score, and to incorporate the diffused scent score into
the match database.

27. A system for collaborator discovery among a plurality of users as set forth in claim 17, wherein the scent scores are
increased in proportion to the number of times a particular item is accessed by a particular user. 

28. A system for collaborator discovery among a plurality of users as set forth in claim 19, wherein the users of the sys-
tem access the system by computer and wherein the means for providing user anonymity is distributed across the
computers used by the users.

29. A system for collaborator discovery among a plurality of users as set forth in claim 19, wherein the means for pro- 
viding user anonymity is centralized. 

30. A system for collaborator discovery among a plurality of users as set forth in claim 20, wherein the means to allow
for interaction between users is a chat system. 

31. A system for collaborator discovery among a plurality of users as set forth in claim 20, wherein the means to allow 
for interaction between users is an e-mail system. 

32. A system for collaborator discovery among a plurality of users as set forth in claim 27, wherein the matcher further
includes a decay engine linked to the match database to decay the measure of similarity and the at least one scent
score for each particular item for each user over time.

33. A system for collaborator discovery among a plurality of users as set forth in claim 27, wherein the at least one
scent score associated for each particular user and information item by the entry processor includes a short-term 
scent score and a long-term scent score where, for each subsequent access of each particular item by a particular 
user, the entry processor increases the short-term scent score and long-term scent score in proportion to the 
number of accesses of a particular item by the particular user such that the short-term scent score increases more 
rapidly than the long-term scent score. 

34. A system for collaborator discovery among a plurality of users as set forth in claim 27, wherein a maximum scent 
score value is set such that when a particular scent score reaches the maximum scent score value, it ceases to 
increase. 

35. A system for collaborator discovery among a plurality of users as set forth in claim 28, wherein the decay engine 
includes a decay rate used to decay the long-term scent scores and the short-term scent scores over time with the 
decay rate chosen such that the long-term scent scores decay more slowly than the short-term scent scores. 

36. A system for collaborator discovery among a plurality of users as set forth in claim 35, 

a. wherein the entry processor associates short-term scent score and long-term scent scores with each par- 
ticular user according to the following, 

SS = CS
SL = CL

wherein SS represents the short-term scent score, SL represents the long-term scent score, and CS and CL
are scalar values chosen as scent score values assigned for the first access of a particular item by a particular
user;

b. wherein the short-term scent score and the long-term scent score are increased according to the following,

SS = SS + (1 - SS) * KS and

SL = SL + (1 - SL) * KL, wherein

SS represents the short-term scent score, SL represents the long-term scent score,
KS and KL represent incrementing rates chosen such that KS > KL;

c. wherein the decay is performed according to the following,

SS = SS * DS and

SL = SL * DL, wherein

SS represents the short-term scent score, SL represents the long-term scent score,
DS and DL represent decay rates chosen such that DS < DL.

37. A system for collaborator discovery among a plurality of users as set forth in claim 36, wherein the item from which
the scent score is diffused is identified as a source item A and the item to which the scent score is diffused is iden-
tified as a destination item B, and the scent score diffusion is performed by the diffusion engine according to,

if $SS_A > SS_B$: $SS'_B = SS_B + (SS_A - SS_B) \cdot L_{AB} \cdot r_S$, and

if $SL_A > SL_B$: $SL'_B = SL_B + (SL_A - SL_B) \cdot L_{AB} \cdot r_L$, wherein

$SS_A$ represents the short-term scent for a particular user at the source item A,

$SS_B$ represents the short-term scent for a particular user at the destination item B,

$SL_A$ represents the long-term scent for a particular user at the source item A,

$SL_B$ represents the long-term scent for a particular user at the destination item B,

$L_{AB}$ represents the measure of similarity between the source item A and the destination item B, $r_S$ provides a
short-term scent diffusion rate, and $r_L$ provides a long-term scent diffusion rate.

38. A system for collaborator discovery among a plurality of users as set forth in claim 37, wherein the matcher corre-
lates the scent scores between user a, representing a particular one of the plurality of users, and user b, represent-
ing another of the plurality of users, where item p represents a particular one of the plurality of items, according to
the following,

$$SS\_Match_{ab} = \sum_p \frac{SS_{ap} \cdot SS_{bp}}{Stot_p},$$

$$SL\_Match_{ab} = \sum_p \frac{SS_{ap} \cdot SL_{bp}}{Stot_p}, \text{ and}$$

$$LL\_Match_{ab} = \sum_p \frac{SL_{ap} \cdot SL_{bp}}{Stot_p}, \text{ where}$$

$SS\_Match_{ab}$ is the match between short-term scent scores of user a and user b;
$SL\_Match_{ab}$ is the match between the short-term scent score of user a and the long-term scent score of user b;
$LL\_Match_{ab}$ is the match between the long-term scent scores of users a and b;
$Stot_p$ is the total number of distinct user scent scores that can be found at item p;
$SS_{ap}$ is the short-term scent score assigned to user a at item p; and
$SL_{ap}$ is the long-term scent score assigned to user a at item p.